A Joint Segmenting and Labeling Approach for Chinese Lexical Analysis

Authors

  • Xinhao Wang
  • Jiazhong Nie
  • Dingsheng Luo
  • Xihong Wu
Abstract

This paper introduces an approach that jointly performs a cascade of segmentation and labeling subtasks for Chinese lexical analysis: word segmentation, named entity recognition, and part-of-speech tagging. Unlike the traditional pipeline, the cascaded subtasks are carried out simultaneously in a single step, so error propagation is avoided and information is shared among the multi-level subtasks. The approach adopts Weighted Finite State Transducers (WFSTs). Within the unified WFST framework, the model for each subtask is represented as a transducer, and the models are then combined into a single one. One-pass decoding over the combined model then yields the jointly optimal outputs for all levels. Experimental results show the effectiveness of the joint processing approach, which significantly outperforms the traditional pipeline method.
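
The idea of one-pass joint decoding can be illustrated with a minimal sketch. It is not the authors' system: instead of building and composing real WFSTs, it runs a shortest-path search over a small lattice whose states pair a character position with the previous tag, which is roughly what decoding a combined segmentation and tagging model amounts to in the tropical (min, +) semiring. The lexicon, the tag set, the function name joint_decode, and every cost below are invented for illustration only.

```python
from math import inf

# Hypothetical toy lexicon: word -> {tag: cost}. Costs play the role of
# negative log-probabilities; all values are made up for this sketch.
LEXICON = {
    "研究": {"V": 1.0, "N": 1.5},
    "研究生": {"N": 1.2},
    "生命": {"N": 1.0},
    "生": {"V": 2.0},
    "命": {"N": 2.5},
}

# Hypothetical tag-bigram costs; "<s>" marks the sentence start.
TAG_BIGRAM = {
    ("<s>", "V"): 0.5, ("<s>", "N"): 0.5,
    ("V", "N"): 0.3, ("N", "N"): 1.0,
    ("N", "V"): 0.8, ("V", "V"): 1.5,
}


def joint_decode(text, lexicon, tag_bigram, default_cost=5.0):
    """Return (cost, [(word, tag), ...]) for the cheapest joint analysis,
    or None if the lexicon cannot cover the text."""
    # best[(position, last_tag)] = (cost so far, backpointer)
    best = {(0, "<s>"): (0.0, None)}
    for i in range(len(text)):
        # Arcs only move forward, so states at position i are final here.
        states = [key for key in best if key[0] == i]
        for pos, prev_tag in states:
            cost_so_far = best[(pos, prev_tag)][0]
            for j in range(i + 1, len(text) + 1):
                word = text[i:j]
                for tag, word_cost in lexicon.get(word, {}).items():
                    # Segmentation cost and tagging cost are added on the
                    # same arc, so both decisions are scored together.
                    total = (cost_so_far + word_cost
                             + tag_bigram.get((prev_tag, tag), default_cost))
                    if total < best.get((j, tag), (inf, None))[0]:
                        best[(j, tag)] = (total, (pos, prev_tag, word))
    finals = [(cost, key) for key, (cost, _) in best.items()
              if key[0] == len(text)]
    if not finals:
        return None
    cost, key = min(finals)
    # Follow backpointers to recover the word/tag sequence.
    path = []
    while best[key][1] is not None:
        pos, prev_tag, word = best[key][1]
        path.append((word, key[1]))
        key = (pos, prev_tag)
    path.reverse()
    return cost, path


if __name__ == "__main__":
    print(joint_decode("研究生命", LEXICON, TAG_BIGRAM))
```

With these invented costs, the search selects the segmentation 研究 / 生命 with tags V and N, because the tag-bigram cost is weighed against the word costs in the same pass rather than after segmentation has been fixed. A pipeline would instead commit to one segmentation first and tag it afterwards.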





Publication date: 2008